Systematic AI-Assisted Literature Review - Sessions 1 & 2
October 1, 2025
Our focus today: Scientific paper summarization and extraction
Key insight: Careful summarization lets us craft the right context to provide to the AI for literature review synthesis
From researcher feedback, common challenges include:
Today’s goal: Build a systematic, AI-assisted approach to paper processing
The challenge: Understanding your input data (scientific papers) and how your AI pipeline behaves on that data at scale.
In literature review context:
Critical insight: We need systematic ways to understand our data and AI performance at scale
The challenge: The gap between what you want the AI to do and what you actually communicate in your prompts.
Example: “Summarize the key findings” leaves many questions unanswered:
Key point: Seemingly clear instructions often contain hidden ambiguities
The challenge: Even with perfect prompts, AI may behave inconsistently across different inputs.
In scientific literature:
Important: Each application requires bridging the Three Gulfs anew - there are no universal solutions
Understanding the Three Gulfs helps us:
Today’s structure: We’ll address each gulf systematically through hands-on practice
Do you have:
Why papers you know well? You’ll be able to evaluate AI output quality and catch errors more easily.
Tool choice: We’ll be tool-agnostic - methodology matters more than specific platforms
Your first prompt: Try this with one of your papers:
“Please summarize this scientific paper.”
Time: 10 minutes
While you work:
Open coding concept: Like “casting a fishing net” - capture whatever comes up without predefined categories.
From social science and LLM evaluation methodology:
Share with the group: What did you observe in your AI summaries?
Based on the group’s observations, common patterns include:
This illustrates: Gulf of Specification - our simple prompt left too much undefined
A well-structured prompt includes:
Enhanced prompt: Try this version:
Role: You are a scientific literature extraction specialist with expertise in [your domain] research.
Objective: Generate a comprehensive, structured summary of the provided scientific paper optimized for literature review preparation.
[Your paper text here]
Time: 10-15 minutes
Continue open coding: Note changes compared to Exercise 1
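If you want to reuse this role/objective prompt across many papers, one option is to keep it as a template and fill in the variable parts per run. The sketch below is only illustrative (the function and placeholder names are not part of the course materials):

```python
# Minimal sketch: keep the role/objective prompt in a reusable template so
# every paper receives an identical instruction block. Names are illustrative.
PROMPT_TEMPLATE = """\
Role: You are a scientific literature extraction specialist with expertise in {domain} research.

Objective: Generate a comprehensive, structured summary of the provided scientific paper optimized for literature review preparation.

Paper:
{paper_text}
"""

def build_summary_prompt(domain: str, paper_text: str) -> str:
    """Fill in the template for one paper."""
    return PROMPT_TEMPLATE.format(domain=domain, paper_text=paper_text)

print(build_summary_prompt("your domain", "[Your paper text here]"))
```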
Further enhanced prompt:
Role: You are a scientific literature extraction specialist…
Objective: Generate a comprehensive, structured summary…
Format required:
- Title and citation information
- Abstract (original)
- Key takeaways by section (bullet points)
- Figures and tables described
[Your paper text here]
Time: 10-15 minutes
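As a rough sketch (an assumption on my part, not course material), the same format requirement can be stored once and reused both to build the prompt and to flag summaries that skip a required section:

```python
# Illustrative sketch: encode the "Format required" list once and flag
# summaries missing any required section. The string-matching check is
# deliberately crude; adapt the section names to your own format.
REQUIRED_SECTIONS = [
    "Title and citation information",
    "Abstract",
    "Key takeaways by section",
    "Figures and tables",
]

FORMAT_BLOCK = "Format required:\n" + "\n".join(f"- {s}" for s in REQUIRED_SECTIONS)

def missing_sections(summary: str) -> list[str]:
    """Return required section names that do not appear in the summary text."""
    return [s for s in REQUIRED_SECTIONS if s.lower() not in summary.lower()]
```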
Group discussion:
Pattern emerging: More specific prompts → more consistent outputs, but new challenges may appear
Use the comprehensive prompt from your materials:
Time: 15-20 minutes
Focus: How does comprehensive specification affect consistency and quality?
It’s very tempting to ask an LLM to write prompts for us upfront.
Why we iterate manually first:
Later: We can use AI to refine prompts, but start with human-led iteration
Available tools:
Best practice: Use these tools in hybrid setups after you understand your requirements
For tomorrow: We’ll dive deeper into evaluation methodology and systematic failure analysis
Try the complete prompt with 2-3 different papers:
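If you prefer to run this comparison programmatically rather than pasting each paper by hand, a minimal sketch is below; the `summarize` helper is a placeholder for whichever model or API you use, and the file layout is just one possibility:

```python
# Sketch: run the same prompt over several papers and save the outputs for
# side-by-side comparison. `summarize` is a placeholder for your chosen API;
# the template is assumed to contain a single {paper_text} placeholder.
from pathlib import Path

def summarize(prompt: str) -> str:
    raise NotImplementedError("Call your chosen LLM API here.")

def run_batch(paper_dir: str, out_dir: str, prompt_template: str) -> None:
    Path(out_dir).mkdir(exist_ok=True)
    for paper in sorted(Path(paper_dir).glob("*.txt")):
        prompt = prompt_template.format(paper_text=paper.read_text(encoding="utf-8"))
        out_path = Path(out_dir) / f"{paper.stem}_summary.md"
        out_path.write_text(summarize(prompt), encoding="utf-8")
```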
Session 1 recap: We started addressing the Gulf of Specification through iterative prompting.
Today’s focus:
Share with the group:
Add to your open coding notes - we’ll use these observations for systematic analysis
Open coding (what we’ve been doing): Collecting observations without predefined categories
Axial coding (today’s focus): Organizing observations into meaningful patterns and categories
The process:
Group activity:
Time: 20-25 minutes
Tool: We’ll use a collaborative form/board to organize our findings
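If you prefer working outside a shared board, the same axial-coding step can be captured in a tiny script; the categories and notes below are purely illustrative, not the group's actual findings:

```python
# Sketch: axial coding as a mapping from category to open-coding notes.
# Category names and notes are made-up examples.
from collections import defaultdict

open_coding_notes = [
    ("Content Issues", "Results in Figure 3 not mentioned in the summary"),
    ("Structural Problems", "Bullet formatting differs between papers"),
    ("Content Issues", "Methodology compressed to a single sentence"),
]

axial: dict[str, list[str]] = defaultdict(list)
for category, note in open_coding_notes:
    axial[category].append(note)

for category, notes in axial.items():
    print(f"{category}: {len(notes)} observation(s)")
```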
Based on collective analysis, common categories include:
Content Issues:
- Missing key information (methodology, results, figures)
- Inconsistent level of detail across papers
- Misinterpretation of complex concepts
Structural Problems:
- Inconsistent formatting despite clear instructions
- Poor cross-referencing between sections and figures
- Variable summary lengths
Domain-Specific Challenges:
- Technical terminology handling
- Different paper structures across subfields
Concept: Use AI to evaluate AI outputs systematically
Why this works:
Caution: Still requires human oversight and validation of evaluation criteria
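A minimal sketch of the idea, assuming a generic `call_llm` wrapper rather than any specific vendor API; the criteria simply restate the ones developed in this session:

```python
# Sketch of LLM-as-judge: ask a second model call to grade a summary against
# explicit criteria. `call_llm` is a placeholder for your chosen API.
JUDGE_PROMPT = """\
You are reviewing an AI-generated summary of a scientific paper.
For each criterion, answer PASS or FAIL with a one-sentence justification.

Criteria:
1. Completeness: all major sections, figures/tables, and key quantitative results are covered.
2. Accuracy: no factual errors; technical terminology is used correctly.
3. Consistency: formatting and level of detail follow the required structure.
4. Relevance: the summary focuses on research-relevant content.

Paper (or excerpt):
{paper_text}

Summary under review:
{summary}
"""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Call your chosen LLM API here.")

def judge_summary(paper_text: str, summary: str) -> str:
    return call_llm(JUDGE_PROMPT.format(paper_text=paper_text, summary=summary))
```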
Based on our axial coding results:
Time: 25-30 minutes
Completeness:
- All major sections summarized
- Figures and tables described
- Key quantitative results included
Accuracy:
- No factual errors or misrepresentations
- Proper technical terminology usage
- Correct interpretation of results
Consistency:
- Uniform formatting across papers
- Consistent level of detail
- Reliable cross-referencing
Relevance:
- Focus on research-relevant content
- Appropriate level of methodological detail
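One lightweight way to apply these criteria, whether scored by hand or by an LLM judge, is to record a pass/fail per criterion for every paper and prompt version so results can be tracked across iterations; the identifiers below are made up:

```python
# Sketch: a per-paper, per-prompt-version score sheet. All values are examples.
from dataclasses import dataclass, asdict

@dataclass
class SummaryScore:
    paper_id: str
    prompt_version: str
    completeness: bool
    accuracy: bool
    consistency: bool
    relevance: bool

scores = [
    SummaryScore("paper_A", "v3", True, True, False, True),
    SummaryScore("paper_B", "v3", True, False, True, True),
]

for s in scores:
    print(asdict(s))
```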
Key principle: Each iteration should target specific, identified failure modes
Your task:
Time: 20-25 minutes
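As a sketch of what targeting specific failure modes can look like in practice (the failure-mode names and corrective instructions below are examples, not prescriptions):

```python
# Sketch of targeted iteration: map each identified failure mode to one
# corrective instruction and append only the relevant ones to the base prompt.
FIXES = {
    "figures_missed": "Describe every figure and table, including key values.",
    "inconsistent_detail": "Keep each section summary to 3-5 bullet points.",
    "terminology_errors": "Preserve the paper's technical terminology exactly.",
}

def refine_prompt(base_prompt: str, failure_modes: list[str]) -> str:
    """Append one corrective instruction per observed failure mode."""
    extra = [FIXES[m] for m in failure_modes if m in FIXES]
    if not extra:
        return base_prompt
    return base_prompt + "\n\nAdditional requirements:\n" + "\n".join(f"- {line}" for line in extra)
```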
Moving beyond single papers:
Next challenge: Using these summaries as context for literature review synthesis
What we’ve achieved:
Ongoing needs:
What we’ve built:
Remember: Specification is never “done” - it evolves with your understanding
What we’ve learned:
Use of AI for Literature Review course, IAEA Laboratories, Seibersdorf, Austria, September 2025.